Colony of NPUs: Scaling the Efficiency of Neural Accelerators
Authors
Abstract
Trading small amounts of output accuracy for significant performance improvement or reduced power consumption is a promising research direction. Many modern applications from domains such as image and signal processing, computer vision, data mining, machine learning, and speech recognition are amenable to approximate computing. In many cases, their output is interpreted by humans, so infrequent variations in the output are not noticeable to users. Moreover, the input of these applications is noisy, which precludes strict constraints on the output of the system. Hence, these domains have significant potential for the aforementioned trade-offs, and these applications are ideal targets for approximate computing.

Existing approaches to approximate computing include modifying the ISA [6], the compiler [1], the programming language [2, 4], the underlying hardware [14], or the entire framework [3, 8, 10]. Approximation accelerators [5, 7, 13] use some of these methods to trade off accuracy for higher performance or energy savings. These accelerators require the programmer to annotate code sections that are amenable to approximation. At run time, the approximate parts are offloaded to the accelerator and the remaining parts are executed by the CPU. Esmaeilzadeh et al. [7] proposed the Neural Processing Unit (NPU) as a programmable approximate accelerator. The key idea is to train a neural network (NN) to mimic an approximable region of the original code and to replace that region with an efficient computation of the learned model.

Although proposed approximate accelerators such as the NPU demonstrate acceptable results on benchmarks from different domains, they have some shortcomings. Firstly, different invocations of a program might produce varying output qualities because the output quality depends on the input values. Consequently, using an NPU with a fixed configuration for a wide range of inputs may produce outputs with poor accuracy. If the output quality drops below a determined threshold, one way to improve it is to re-execute the whole program on the exact hardware; however, the overhead of this recovery process may offset the gains of approximation. Secondly, existing techniques usually measure output quality by averaging the errors of individual output elements, e.g., pixels in an image. Previous work in approximate computing [9, 11, 12] shows that although most output elements have small errors, a few output elements have considerably large errors, which may degrade the whole user experience. Therefore, some inputs might need more specialized methods to deal with the issue of output quality. Lastly, it is challenging to tune the output quality of approximate hardware dynamically. If several NPUs with different configurations are available in the system, the number of active NPUs per invocation becomes a reasonable knob for changing the output quality based on user preferences.

[Figure: a monolithic NPU with its input and output]
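To make the mimic-and-recover idea above concrete, here is a minimal Python sketch, not the paper's implementation: a small multilayer perceptron (scikit-learn's MLPRegressor) stands in for the NPU and is trained to mimic an annotated code region, and each invocation samples a few outputs to estimate quality and falls back to the exact code when the error exceeds a threshold. The names exact_region and invoke, and the quality_threshold value, are illustrative assumptions.

```python
# Minimal sketch of NPU-style neural acceleration with a quality-recovery check.
# Not the paper's implementation; all names and thresholds are illustrative.
import numpy as np
from sklearn.neural_network import MLPRegressor

def exact_region(x):
    """The original, exact computation (stand-in for the annotated code region)."""
    return np.sin(x[:, 0]) * np.cos(x[:, 1])

# "Training phase": fit a small NN to mimic the exact region's input/output behavior.
rng = np.random.default_rng(0)
train_x = rng.uniform(-np.pi, np.pi, size=(5000, 2))
train_y = exact_region(train_x)
npu_model = MLPRegressor(hidden_layer_sizes=(8, 8), max_iter=2000).fit(train_x, train_y)

def invoke(x, quality_threshold=0.05):
    """Run the learned model; re-execute exactly if the sampled error is too large."""
    approx = npu_model.predict(x)
    # Sample a few elements to estimate output quality (hypothetical recovery policy).
    sample = rng.choice(len(x), size=min(16, len(x)), replace=False)
    err = np.mean(np.abs(approx[sample] - exact_region(x[sample])))
    if err > quality_threshold:
        return exact_region(x)   # recovery: re-run the region exactly
    return approx                # accept the approximate result
```

Sampling only a handful of output elements keeps the quality check cheap relative to re-executing the whole region, which is the trade-off the recovery discussion above hinges on.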
Similar Resources
Towards Neural Acceleration for General-Purpose Approximate Computing
Energy efficiency is becoming crucial to realizing the benefits of technology scaling. We introduce a new class of low-power accelerators called Neural Processing Units (NPUs). Instead of being programmed, NPUs learn to behave like general-purpose code written in an imperative language. After a training phase, NPUs mimic the original code with acceptable accuracy. We describe an NPU-augmented a...
Full text
AxBench: A Benchmark Suite for Approximate Computing Across the System Stack
As the end of Dennard scaling looms, both the semiconductor industry and the research community are exploring innovative solutions that allow energy efficiency and performance to continue to scale. Approximate computing has become one of the viable techniques to perpetuate the historical improvements in the computing landscape. As approximate computing attracts more attention in the commu...
Full text
Estimation of Total Organic Carbon from well logs and seismic sections via neural network and ant colony optimization approach: a case study from the Mansuri oil field, SW Iran
In this paper, 2D seismic data and petrophysical logs of the Pabdeh Formation from four wells of the Mansuri oil field are utilized. The ΔLog R method was used to generate a continuous TOC log from the petrophysical data, and the TOC values calculated by the ΔLog R method were then used for a multi-attribute seismic analysis. In this study, seismic inversion was performed based on a neural network algorithm, and the resu...
Full text
MATIC: Adaptation and In-situ Canaries for Energy-Efficient Neural Network Acceleration
We present MATIC (Memory Adaptive Training with In-situ Canaries), a voltage scaling methodology that addresses the SRAM efficiency bottleneck in DNN accelerators. To overscale DNN weight SRAMs, MATIC combines the characteristics of destructive SRAM reads with the error resilience of neural networks in a memory-adaptive training process. PVT-related voltage margins are eliminated using bit-cell...
Full text
Approximate Acceleration for a Post Multicore Era
Approximate Acceleration for a Post Multicore Era. Hadi Esmaeilzadeh. Co-Chairs of the Supervisory Committee: Professor Doug Burger, Microsoft Research; Associate Professor Luis Ceze, University of Washington. Starting in 2004, the microprocessor industry has shifted to multicore scaling, increasing the number of cores per die each technology generation, as its principal strategy for continuing performance growt...
Full text